Information based data anonymization for classification utility
نویسندگان
چکیده
Article history: Received 27 September 2010 Received in revised form 10 April 2011 Accepted 5 July 2011 Available online 22 July 2011 Anonymization is a practical approach to protect privacy in data. The major objective of privacy preserving data publishing is to protect private information in data whereas data is still useful for some intended applications, such as building classification models. In this paper, we argue that data generalization in anonymization should be determined by the classification capability of data rather than the privacy requirement. Wemake use of mutual information for measuring classification capability for generalization, and propose two k-anonymity algorithms to produce anonymized tables for building accurate classification models. The algorithms generalize attributes to maximize the classification capability, and then suppress values by a privacy requirement k (IACk) or distributional constraints (IACc). Experimental results show that algorithm IACk supports more accurate classification models and is faster than a benchmark utility-aware data anonymization algorithm. © 2011 Elsevier B.V. All rights reserved.
منابع مشابه
An Effective Method for Utility Preserving Social Network Graph Anonymization Based on Mathematical Modeling
In recent years, privacy concerns about social network graph data publishing has increased due to the widespread use of such data for research purposes. This paper addresses the problem of identity disclosure risk of a node assuming that the adversary identifies one of its immediate neighbors in the published data. The related anonymity level of a graph is formulated and a mathematical model is...
متن کاملارایه یک روش جدید انتشار دادهها با حفظ محرمانگی با هدف بهبود دقّت طبقهبندی روی دادههای گمنام
Data collection and storage has been facilitated by the growth in electronic services, and has led to recording vast amounts of personal information in public and private organizations databases. These records often include sensitive personal information (such as income and diseases) and must be covered from others access. But in some cases, mining the data and extraction of knowledge from thes...
متن کاملUtility-preserving anonymization for health data publishing
BACKGROUND Publishing raw electronic health records (EHRs) may be considered as a breach of the privacy of individuals because they usually contain sensitive information. A common practice for the privacy-preserving data publishing is to anonymize the data before publishing, and thus satisfy privacy models such as k-anonymity. Among various anonymization techniques, generalization is the most c...
متن کاملEvaluating the Utility of Single Field Anonymization Polices by the IDS Metric : Towards measuring the trade off between Utility and Security
Anonymization is the process of removing or hiding sensitive information in logs. Anonymization allows organizations to share network logs while not exposing sensitive information. However, there is an inherent trade off between the amount of information revealed in the log and the usefulness of the log to the client (the utility of a log). There are many anonymization techniques, and there are...
متن کاملSecGraph: A Uniform and Open-source Evaluation System for Graph Data Anonymization and De-anonymization
In this paper, we analyze and systematize the state-ofthe-art graph data privacy and utility techniques. Specifically, we propose and develop SecGraph (available at [1]), a uniform and open-source Secure Graph data sharing/publishing system. In SecGraph, we systematically study, implement, and evaluate 11 graph data anonymization algorithms, 19 data utility metrics, and 15 modern Structure-base...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- Data Knowl. Eng.
دوره 70 شماره
صفحات -
تاریخ انتشار 2011